Abstract

The dense deployment of small-cell base stations (SBSs) in heterogeneous and small-cell networks (HetSNets) requires efficient resource allocation techniques. In particular, the problem of associating users to SBSs must be revisited and carefully studied. This problem is NP-hard and requires solving an integer optimization problem. To solve it efficiently, we model it using non-cooperative game theory. First, we design two non-cooperative games for the problem and show the existence of pure Nash equilibria (PNE) in both games. These equilibria are shown to be far from the social optimum. Hence, we propose a better game design in order to approach this optimum. This new game is proved to have no PNE in general. However, simulations with Rayleigh fading channels show that a PNE exists in every simulated instance of the game. In addition, we show that its prices of anarchy and stability are close to one. We propose a best response dynamics (BRD) algorithm that converges to a PNE when one exists. Because BRD requires a large amount of information exchange, a completely distributed algorithm based on the theory of learning is proposed. Simulations show that this algorithm achieves near-optimal performance and converges to a PNE (when one exists) with high probability.


Index Terms 

User-BS association, game theory, pure Nash equilibrium, distributed learning algorithms. 


This work has been partly presented at IEEE ICC, London, UK, June 2015. 
I. Introduction

The unprecedented growth of mobile data traffic gives rise to many challenges in today's cellular networks. Hence, novel network architectures and resource allocation solutions are needed to cope with this exponential growth. Cellular networks have shifted from the deployment of traditional, expensive high-power base stations (BSs) towards the deployment of heterogeneous low-power BSs, including small-cell BSs (SBSs) [?]. This led to the emergence of heterogeneous and small-cell networks (HetSNets). In HetSNets, different heterogeneous elements coexist, such as distributed antenna systems, relay nodes, and SBSs, including pico-cell BSs (PBSs), femto-cell BSs (FBSs), etc. The deployment of SBSs is emerging as a key solution to the increasing demand for reliable and high-speed wireless access [?]. Their adoption is motivated by several factors, such as their ease of deployment and their low installation and maintenance costs. They are regarded as ideal candidates for future generations of cellular networks, enabling better coverage and higher data rates [?].

In HetSNets, two kinds of interference arise, namely cross-tier interference and co-tier interference [?]. Whereas the former can be managed using the spectrum splitting approach [?], where macro-cell BSs (MBSs) and SBSs are allocated different portions of the spectrum, co-tier interference is very difficult to control, especially when the deployed SBSs do not communicate with each other. To better manage co-tier interference, several techniques have been introduced in the literature, such as power allocation, spectrum allocation, and user-BS association [?]. In this paper, we are interested in the user-BS association problem. Roughly speaking, we define this problem as follows: given a set of users, a set of BSs, a threshold value, and a channel gain between every user-BS pair, find a one-to-one association of as many users as possible to the BSs such that every established association has a signal-to-interference-plus-noise ratio (SINR) greater than the threshold value. In our previous work, we formulated this problem and proved that it is NP-hard [?]. In this paper, we are interested in solving it distributively in HetSNets. Hence, we propose to model it using non-cooperative game theory. Moreover, we design two strategy updating algorithms which approach the highly complex centralized user-BS association. The techniques proposed in the literature either do not use game theory or do not provide a completely distributed algorithm for this problem. This motivates the design of efficient game models and completely distributed user-BS association mechanisms in HetSNets.

Related work can be divided into three categories: (i) centralized solutions, where the decision lies with a central coordinator [7]-[10]; (ii) distributed solutions with a high amount of information exchange; and (iii) applications of learning algorithms to HetSNets. In the sequel, we present the most recent related work in these directions.

In [?], the authors study resource allocation as a joint optimization problem of channel allocation, user-BS association, beam-forming, and power control in HetSNets. It is solved using an iterative algorithm based on ℓp-norm heuristics. This work maximizes the total up-link throughput and guarantees the QoS of users. Even though the work shows that relaxing the combinatorial problem to a continuous one provides the optimal solution, the proof lacks generality and depends on the problem formulation. In [?], the joint power allocation and user-BS association is modeled as a combinatorial optimization problem. The authors use Benders decomposition to solve the problem optimally and also propose heuristic algorithms. However, both the optimal method and the heuristic algorithms are highly complex. In [?], the joint user-BS association and power control problem in HetSNets is considered. The joint problem is formulated as an optimization problem whose objective is to maximize the minimum SINR subject to a power constraint at each BS. This problem is shown to be NP-hard, and heuristic algorithms are proposed to solve it. The differences with our work are threefold. First, the paper solves the user-BS association and the power control in a centralized fashion. Second, its system model allows multiple users to be associated with one BS. Third, the problem does not guarantee a minimum quality of service (QoS) to the associated users. Reference [?] considers the problem of user-BS association and spectrum allocation in HetSNets. The paper adopts stochastic geometry to derive the theoretical mean utility based on the coverage rate. The user-BS association is performed based on the biased received power, and the bias factors are obtained analytically using the cell range expansion scheme.


In [11], the authors solve the joint problem of user-BS association and resource allocation in orthogonal-frequency-division-multiple-access (OFDMA) networks. The problem is formulated as a weighted sum rate maximization problem and is shown to be NP-hard. Next, based on mechanism design, the joint problem of user-BS association and resource allocation is modeled as a non-cooperative game and solved using the Vickrey-Clarke-Groves (VCG) mechanism. The main differences between this work and ours are twofold. First, the optimization objective is a weighted sum rate which does not depend on the interference of the associated users: because of the OFDMA system model, there is no interference, and therefore the rate is a function of the signal-to-noise ratio (SNR). Second, the user-BS association game has a continuous utility function (the weighted rate of the user), which allows the authors to use the theory of super-modularity and complementarity, whereas we consider a discontinuous utility function and a different system model. In [?], the user-BS association problem is solved jointly for fairness and load balancing. The load of the SBSs is balanced using a distributed algorithm based on dual decomposition. This work solves the user-BS association problem using relaxation and rounding techniques, which remove the combinatorial nature of the problem and render it easier to solve. Reference [?] solves the user-BS association in HetSNets based on a pricing scheme. The user-BS association is solved based on a Lagrangian dual analysis, and a dual coordinate descent method is proposed. The paper also extends the problem to the multiple-input-multiple-output (MIMO) case and optimizes the beam-forming variables. However, the proposed distributed algorithm is not based on game theory. Both [?] and [?] focus on the same system model, which is very different from the system model of this paper.

The related work that investigates the application of learning algorithms in HetSNets focuses on other types of problems, such as power allocation and link activation [14]-[17]. In [14], the authors formulate the link activation problem as a non-cooperative game and study its convergence to a mixed NE. The transmitter-receiver links are already established, and the problem is to determine which links are to be simultaneously activated. The same system model is considered in [?], where the author designs no-regret algorithms that converge to sub-optimal performance. Note that, once again, the tackled problem in this work is power allocation and the links are pre-established. The work in [?] makes use of a recent learning paradigm called interactive trial and error in order to design completely distributed algorithms for the power allocation problem. The authors prove the convergence of their solution to an epsilon-NE. In the context of femto-cell networks, many learning algorithms have been proposed in the literature. For instance, the authors in [?] propose and compare two learning mechanisms based on Q-learning and evolutionary game theory.

The key contributions of this paper are the following: 
• We model the user-BS association problem in a HetSNet as non-cooperative repeated games.

• We investigate the pure Nash equilibrium (PNE) as a solution concept for the formulated games, and we prove, using potential games and best response updates, that these games admit PNEs.

• We show, using simulations, that the proposed games perform poorly compared to the social optimum.

• To overcome these shortcomings, we propose a better game design. This new game is shown to have no PNE in general. Nevertheless, when the reformulated game admits PNEs, these PNEs are very efficient: the price of stability (PoS) and the price of anarchy (PoA) of the game are shown to be very close to one. Furthermore, it is observed in simulations that the game always admits PNEs for Rayleigh fading channels.

• In order to solve the formulated game, we propose two strategy updating algorithms. The first is the best response dynamics (BRD) algorithm. The second is a completely distributed algorithm, inspired by the well-known win-stay-lose-shift (WSLS) learning rule [18], [19], and is called modified WSLS (mWSLS).

• We show that mWSLS can be implemented in a completely distributed manner and that it achieves near-optimal performance. Furthermore, mWSLS is shown to converge to a PNE (when one exists) with high probability.

The rest of the paper is organized as follows: Section II presents the system model and formulates the integer programming problem. Section III first formulates the user-BS association problem as two non-cooperative games and shows the existence of PNEs in both games. Second, it designs a better game and analyzes it. Section IV discusses the performance and the efficiency of the PNEs. Next, Section V describes the proposed best response dynamics (BRD) algorithm and Section VI proposes a completely distributed algorithm. Then, Section VII presents the simulation results and, finally, Section VIII draws some conclusions.
II. System Model

This paper considers a HetSNet composed of a macro-cell base station (MBS), N SBSs, and M small-cell users (SUs). The set of SUs and the set of SBSs are denoted by $\mathcal{M} = \{1, \dots, M\}$ and $\mathcal{N} = \{1, \dots, N\}$, respectively. The SBSs are assumed to have no cognitive capabilities and hence cannot perform spectrum sensing in order to avoid interference to and from the MBS transmissions. Therefore, the spectrum splitting approach [?] is assumed: the SBSs transmit over a portion of the MBS spectrum, whereas the remaining part of the spectrum is exclusively dedicated to the macro-cell users (MUs). Thus, there is no cross-tier interference between the MBS and the SBS transmissions. However, SBSs transmitting at the same time suffer from co-tier interference. Down-link transmission is considered, where each available SBS can transmit to an SU. Every transceiver in the network is equipped with a single antenna, and an SBS transmits to one and only one SU at any given time. An example of the system model is given in Fig. 1.



When an SBS $n \in \mathcal{N}$ is transmitting to an SU $m \in \mathcal{M}$, it uses a fixed power $p_n$ and establishes a link $l_{mn}$. The channel gain over link $l_{mn}$, for all $n \in \mathcal{N}$ and all $m \in \mathcal{M}$, is modeled as a random variable $g_{mn}$ which takes into account the long-term path-loss propagation effect and the short-term fading effect. Further, time is divided into time-slots, and the channel gains are fixed during one time-slot. The received SINR of the established link $l_{mn}$ is given by:




$$\mathrm{SINR}_{mn} = \frac{p_n \, |g_{mn}|^2}{\sigma^2 + \sum_{n' \in \mathcal{T}_n} p_{n'} \, |g_{mn'}|^2} \qquad (1)$$


where $\mathcal{T}_n = \mathcal{T} \setminus \{n\}$, $\mathcal{T}$ is the set of SBSs transmitting during the current time-slot, and $\sigma^2$ is the variance of the additive white Gaussian noise (AWGN).
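As a minimal sketch of Eq. (1), assuming the channel gains are stored as a nested list `h` with `h[m][n]` = |g_mn|² (0-based indices) and using the hypothetical helper name `sinr`, the interference term is simply a sum over the other transmitting SBSs:

```python
def sinr(h, p, sigma2, m, n, transmitting):
    """SINR of link l_mn as in Eq. (1).

    Assumption: 0-based indices and h[m][n] = |g_mn|^2.
    h[m][n]      -- channel gain from SBS n to SU m
    p[n]         -- fixed transmit power of SBS n
    sigma2       -- AWGN variance sigma^2
    transmitting -- set T of SBSs transmitting in the current time-slot
    """
    # Co-tier interference at SU m from every SBS n' in T_n = T \ {n}
    interference = sum(p[k] * h[m][k] for k in transmitting if k != n)
    return p[n] * h[m][n] / (sigma2 + interference)


# Illustrative gains for two SBSs and two SUs (assumed values, not from the paper)
h = [[1.0, 0.4],
     [0.1, 1.0]]
p = [4.0, 4.0]
print(sinr(h, p, 1.0, 0, 0, {0}))     # SBS 0 alone: 4.0
print(sinr(h, p, 1.0, 0, 0, {0, 1}))  # SBS 1 also transmits: 4 / (1 + 4 * 0.4)
```

Only the set of active SBSs matters for the interference, not which SU each of them serves, which is the property exploited later in the game formulations.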

In this work, the main objective is to maximize the number of associated SUs in each time-slot, subject to a system-level performance constraint: the QoS of each associated SU has to be kept above a specified threshold. The QoS constraint is expressed using the SINR of each established link. The optimization problem is formulated in the next subsection.


A. Centralized Solution

The user-BS association problem can be formulated as an integer programming problem [?]. The objective is to maximize the total number of associated SUs in the HetSNet subject to the SINR threshold constraints at the SUs.

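The problem can be formulated as follows:

$$
\begin{aligned}
\underset{\mathbf{x}}{\text{maximize}} \quad & \sum_{m \in \mathcal{M}} \sum_{n \in \mathcal{N}} x_{mn} && \text{(2a)}\\
\text{subject to} \quad & \sum_{m \in \mathcal{M}} x_{mn} \le 1, \quad \forall n \in \mathcal{N}, && \text{(2b)}\\
& \sum_{n \in \mathcal{N}} x_{mn} \le 1, \quad \forall m \in \mathcal{M}, && \text{(2c)}\\
& \Gamma_{mn}(\mathbf{x}) \ge \beta \cdot x_{mn}, \quad \forall m \in \mathcal{M}, \ \forall n \in \mathcal{N}, && \text{(2d)}\\
& x_{mn} \in \{0, 1\}, \quad \forall m \in \mathcal{M}, \ \forall n \in \mathcal{N}. && \text{(2e)}
\end{aligned}
$$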


The optimization variables are given by the matrix $\mathbf{x} = [x_{mn}]$. Constraints (2b) ensure that an SBS associates with at most one SU, whereas constraints (2c) ensure that an SU is associated with at most one SBS. Constraints (2d) guarantee that an associated SU-SBS pair has an SINR above the threshold $\beta$. Finally, constraints (2e) ensure that the association variables $x_{mn}$ are Boolean. Note that $\Gamma_{mn}(\mathbf{x})$ in (2d) represents the SINR of the corresponding link $l_{mn}$ and is given by:


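$$\Gamma_{mn}(\mathbf{x}) = \frac{p_n \, |g_{mn}|^2 \, x_{mn}}{\sigma^2 + \sum_{m' \in \mathcal{M}'} \sum_{n' \in \mathcal{N}'} p_{n'} \, |g_{mn'}|^2 \, x_{m'n'}} \qquad (3)$$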


where $\mathcal{M}' = \mathcal{M} \setminus \{m\}$ and $\mathcal{N}' = \mathcal{N} \setminus \{n\}$.

We proved in [?] that problem (2) is NP-hard. Therefore, its optimal solution cannot be found in polynomial time unless P = NP. In this paper, problem (2) is solved using a branch-and-bound algorithm implemented in the CPLEX solver [?]. This algorithm gives the optimal solution for reasonable values of M and N, and it is therefore used in the subsequent sections as a benchmark.
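For very small instances, problem (2) can also be solved by exhaustive search, which is handy for checking results without a MILP solver. The sketch below is a swapped-in brute-force method, not the CPLEX branch-and-bound used in the paper; it assumes 0-based indices, `h[m][n]` = |g_mn|², and hypothetical helper names. It scans association sizes from min(M, N) downward and enumerates distinct SBS-SU pairings, which enforces (2b), (2c), and (2e) by construction, then checks (2d) with the SINR of Eq. (3).

```python
from itertools import combinations, permutations


def gamma(h, p, sigma2, m, n, active):
    """SINR Gamma_mn(x) of Eq. (3); `active` holds the SBSs with an association."""
    # Every other associated SBS serves a different SU, so it interferes with SU m
    interference = sum(p[k] * h[m][k] for k in active if k != n)
    return p[n] * h[m][n] / (sigma2 + interference)


def max_association(h, p, sigma2, beta):
    """Exhaustive search for problem (2); returns {SBS n: SU m} of maximum size.

    Assumption: h[m][n] = |g_mn|^2 with 0-based indices. The running time is
    exponential in M and N, so this only suits toy instances.
    """
    M, N = len(h), len(h[0])
    for k in range(min(M, N), 0, -1):              # largest objective (2a) first
        for sbss in combinations(range(N), k):     # each chosen SBS serves one SU (2b)
            for sus in permutations(range(M), k):  # distinct SUs, one SBS each (2c)
                active = set(sbss)
                if all(gamma(h, p, sigma2, m, n, active) >= beta
                       for n, m in zip(sbss, sus)):  # SINR constraints (2d)
                    return dict(zip(sbss, sus))
    return {}  # no SU can be associated


# Illustrative two-SBS instance (assumed values): both SUs can be served at once
print(max_association([[1.0, 0.1], [0.1, 1.0]], [4.0, 4.0], 1.0, 2.0))  # {0: 0, 1: 1}
```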


III. User-BS Association Games


A. Games Formulation


This section formulates the user-BS association problem using non-cooperative game theory. First, we propose a simple game model which is shown to admit PNEs. Second, we modify this game to penalize collisions, i.e., situations where two or more players choose the same action.

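The first user-BS association game is given by $\mathcal{G}_1 = (\mathcal{N}, \{\mathcal{B}_n\}_{n \in \mathcal{N}}, \{v_n\}_{n \in \mathcal{N}})$, where:

• $\mathcal{N}$ is the set of players, i.e., the SBSs¹;

• $\{\mathcal{B}_n\}_{n \in \mathcal{N}}$ is the set of action sets of the players, where the actions available to player $n$ are given by $\mathcal{B}_n = \mathcal{M}$. The actions of all the players are given by the Cartesian product $\mathcal{B} = \mathcal{B}_1 \times \mathcal{B}_2 \times \cdots \times \mathcal{B}_N$. The vector $\mathbf{a} = (a_1, \dots, a_N) \in \mathcal{B}$ denotes an action profile of $\mathcal{G}_1$, where $a_n \in \mathcal{B}_n$; and

• $\{v_n\}_{n \in \mathcal{N}}$ is the set of payoffs of the players. The payoff of player $n$ is given by the function $v_n : \mathcal{B} \to \{-1, 1\}$.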

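According to these notations, we rewrite the SINR expression in (1) as follows:

$$\mathrm{SINR}_{a_n n}(\mathbf{a}) = \mathrm{SINR}_{a_n n}(a_n, \mathbf{a}_{-n}) \overset{\text{def}}{=} \frac{p_n \, |g_{a_n n}|^2}{\sigma^2 + \sum_{n' \in \mathcal{S}_n} p_{n'} \, |g_{a_n n'}|^2} \qquad (4)$$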


¹We use SBSs (resp. SUs) and players (resp. users) interchangeably.
where $\mathcal{S}_n = \{n' \in \mathcal{N} : n' \neq n\}$, $\mathbf{a}$ is an action profile in $\mathcal{B}$, and $\mathbf{a}_{-n}$ is the action profile $\mathbf{a}$ with player $n$'s action dropped, i.e., $\mathbf{a}_{-n} = (a_1, \dots, a_{n-1}, a_{n+1}, \dots, a_N)$.

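The payoff function of player $n$, $v_n(\cdot)$, is given by:

$$v_n(\mathbf{a}) \overset{\text{def}}{=} \begin{cases} -1 & \text{if } \mathrm{SINR}_{a_n n}(a_n, \mathbf{a}_{-n}) < \beta \\ 1 & \text{if } \mathrm{SINR}_{a_n n}(a_n, \mathbf{a}_{-n}) \ge \beta \end{cases} \qquad (5)$$

The following definitions are used in the rest of the paper.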


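Definition 1 (Pure Nash Equilibrium): A PNE is an action profile $\mathbf{a}^* = (a_n^*, \mathbf{a}_{-n}^*)$ such that, for every player $n$ and every $a_n' \in \mathcal{A}_n$, $u_n(a_n^*, \mathbf{a}_{-n}^*) \ge u_n(a_n', \mathbf{a}_{-n}^*)$.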

Definition 2 (Potential Game): A game is called a potential game if the incentive of all players to change their strategy can be expressed using a single global function called the potential function.


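Definition 3 (Potential Function): A function $\Phi : \mathcal{A} \to \mathbb{R}$ is an exact potential function if, for every $n \in \mathcal{N}$ and every $\mathbf{a}_{-n} \in \mathcal{A}_{-n}$,

$$u_n(a_n, \mathbf{a}_{-n}) - u_n(b_n, \mathbf{a}_{-n}) = \Phi(a_n, \mathbf{a}_{-n}) - \Phi(b_n, \mathbf{a}_{-n}) = \Delta\Phi$$

for every $a_n, b_n \in \mathcal{A}_n$.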

Proposition 1 The game $\mathcal{G}_1$ admits at least one PNE.


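Proof: We prove Proposition 1 by showing that $\mathcal{G}_1$ is a potential game. Let $\Phi_1 : \mathcal{B} = \times_{i=1}^{N} \mathcal{B}_i \to \mathbb{Z}$ be the function defined as follows:

$$\Phi_1(a_n, \mathbf{a}_{-n}) = \sum_{i=1}^{N} v_i(a_n, \mathbf{a}_{-n}) \qquad (6)$$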
It is easy to see that:

$$
\begin{aligned}
\Delta\Phi_1 &= \sum_{i=1}^{N} v_i(a_n, \mathbf{a}_{-n}) - \sum_{i=1}^{N} v_i(b_n, \mathbf{a}_{-n}) \\
&= v_n(a_n, \mathbf{a}_{-n}) - v_n(b_n, \mathbf{a}_{-n}) + \underbrace{\sum_{i=1, i \neq n}^{N} v_i(a_n, \mathbf{a}_{-n}) - \sum_{i=1, i \neq n}^{N} v_i(b_n, \mathbf{a}_{-n})}_{=0}
\end{aligned}
\qquad (7)
$$

The difference of the two sums in the last line of the right-hand side of (7) is equal to zero because the change of player $n$'s action from $a_n$ to $b_n$ only affects the payoff of player $n$. In fact, if an arbitrary SBS $n$ changes its action from $a_n$ to $b_n$, the only payoff that changes is its own: in $\mathcal{G}_1$ every SBS transmits, so the interference caused by SBS $n$ to the other links is already present and does not depend on the SU that SBS $n$ chooses.

According to Definitions 2 and 3, the game $\mathcal{G}_1$ is a potential game and hence admits at least one PNE. ■
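The exact-potential property used in this proof can be checked numerically on small random instances. The sketch below, assuming 0-based indices, `h[m][n]` = |g_mn|², and hypothetical helper names, enumerates every action profile of $\mathcal{G}_1$ and every unilateral deviation, and verifies that the change in $v_n$ always equals the change in $\Phi_1$ of Eq. (6).

```python
import itertools
import random


def payoff_g1(a, n, h, p, sigma2, beta):
    """Payoff v_n of Eq. (5); in G1 every SBS transmits, so S_n = N \\ {n}."""
    m = a[n]
    interference = sum(p[k] * h[m][k] for k in range(len(a)) if k != n)
    return 1 if p[n] * h[m][n] / (sigma2 + interference) >= beta else -1


def phi1(a, h, p, sigma2, beta):
    """Candidate exact potential of Eq. (6): the sum of all players' payoffs."""
    return sum(payoff_g1(a, i, h, p, sigma2, beta) for i in range(len(a)))


def is_exact_potential(h, p, sigma2, beta):
    """Check Definition 3 for every profile a and every deviation a_n -> b_n."""
    M, N = len(h), len(h[0])
    for a in itertools.product(range(M), repeat=N):
        for n in range(N):
            for b_n in range(M):
                b = a[:n] + (b_n,) + a[n + 1:]
                dv = payoff_g1(a, n, h, p, sigma2, beta) - payoff_g1(b, n, h, p, sigma2, beta)
                dphi = phi1(a, h, p, sigma2, beta) - phi1(b, h, p, sigma2, beta)
                if dv != dphi:
                    return False
    return True


# Random illustrative gains (assumption): M = 4 SUs, N = 3 SBSs
rng = random.Random(1)
h = [[rng.random() for _ in range(3)] for _ in range(4)]
print(is_exact_potential(h, [4.0] * 3, 0.1, 2.0))  # True
```

The check always passes because the interference seen by any link in $\mathcal{G}_1$ depends only on the (fixed) set of transmitting SBSs, so a change of $a_n$ cannot alter any $v_i$ with $i \neq n$.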

Note that the formulation of $\mathcal{G}_1$ ignores an important aspect of the system model, namely the occurrence of collisions, where at least two SBSs choose the same action. In order to take this into account, we modify $\mathcal{G}_1$ into the game $\mathcal{G}_2 = (\mathcal{N}, \{\mathcal{B}_n\}_{n \in \mathcal{N}}, \{w_n\}_{n \in \mathcal{N}})$, where the payoff function of player $n$, $w_n(\cdot)$, is given as follows:


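$$w_n(\mathbf{a}) \overset{\text{def}}{=} \begin{cases} -2 & \text{if } \exists n' \neq n : a_n = a_{n'} \\ -1 & \text{if } \mathrm{SINR}_{a_n n}(a_n, \mathbf{a}_{-n}) < \beta \\ 1 & \text{if } \mathrm{SINR}_{a_n n}(a_n, \mathbf{a}_{-n}) \ge \beta \end{cases} \qquad (8)$$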


According to this formulation, SBSs will try to stay away from collisions in order to selfishly 
improve their performance. 

In game theory, a best response of a player is an action that produces the most favorable outcome for that player, given the other players' actions [22]. The concept of best response is central to Nash's theorem: the set of PNEs of a given game is the set of action profiles at which every player's action is a best response to the actions of the others. The following definition is useful for the next analysis.
Definition 4 (Best Response): Let $\mathrm{BR}_n(\cdot)$ be the set of best responses of player $n$. Then $a_n^* \in \mathrm{BR}_n(\mathbf{a}_{-n}) \iff \forall a_n \in \mathcal{A}_n, \ u_n(a_n^*, \mathbf{a}_{-n}) \ge u_n(a_n, \mathbf{a}_{-n})$.

Definition 4 says that the action $a_n^*$ of player $n$ is a best response to the action profile $\mathbf{a}_{-n}$ if no other action $a_n$ of player $n$ does strictly better than $a_n^*$. When each player in turn plays a best response, we call this process a best response update.

Proposition 2 The game $\mathcal{G}_2$ admits at least one PNE.

Proof: The proof consists in showing that the best response update of the SBSs converges to an equilibrium point. Assume that the best response update starts from an arbitrary action profile $\mathbf{a}$ such that $w_n(\mathbf{a}) \in \{1, -1, -2\}$. Two cases can be distinguished:

1) $M \ge N$ (at least as many actions as players): In this case, SBSs having a payoff of -2 change their actions by best responding in order to get -1 or 1. They succeed because there are at least as many actions as players, so an idle SU is always available to a colliding SBS. Hence, at the end of the first iteration of best responses, every SBS transmits to a different user, and therefore there is no collision. Next, SBSs having a payoff of -1 best respond by looking for an SU which is idle (not chosen by other SBSs). If $M = N$, then there is no idle SU, a one-to-one matching exists, and none of the SBSs can deviate. Hence, an equilibrium is reached. If $M > N$, then every SBS with a payoff of -1 looks for an idle SU and transmits to it if this yields a payoff of 1, and an equilibrium is eventually reached.

2) $M < N$ (more players than actions): In this case, collisions cannot all be avoided. SBSs getting -2 move to idle SUs as long as such SUs exist. Once every SU is chosen by at least one SBS, the SBSs that are still in a collision cannot strictly improve their payoff by best responding, since any other action also leads to a collision. Hence, an equilibrium is reached.

This proves that the best response update leads to an equilibrium point where none of the SBSs has an incentive to deviate. Hence, $\mathcal{G}_2$ admits at least one PNE. ■
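A minimal sketch of the best response update used in this proof, assuming 0-based indices, `h[m][n]` = |g_mn|², and hypothetical helper names. In this sketch a player switches only when this strictly increases its own payoff (ties keep the current action; otherwise the lowest SU index wins). Each such switch moves the player to an idle SU, so it never lowers another player's payoff and strictly increases the sum of payoffs, which is why the loop terminates.

```python
import random


def payoff_g2(a, n, h, p, sigma2, beta):
    """Payoff w_n of Eq. (8); in G2 every SBS transmits, so S_n = N \\ {n}."""
    m = a[n]
    if any(a[k] == m for k in range(len(a)) if k != n):
        return -2                                       # collision on SU m
    interference = sum(p[k] * h[m][k] for k in range(len(a)) if k != n)
    return 1 if p[n] * h[m][n] / (sigma2 + interference) >= beta else -1


def payoffs_per_action(a, n, h, p, sigma2, beta):
    """Payoff of player n for each SU it could pick, the other actions being fixed."""
    return [payoff_g2(a[:n] + [m] + a[n + 1:], n, h, p, sigma2, beta) for m in range(len(h))]


def best_response_update(a, h, p, sigma2, beta, max_rounds=100):
    """Sequential best-response update for G2; returns (profile, converged)."""
    a = list(a)
    for _ in range(max_rounds):
        changed = False
        for n in range(len(a)):
            values = payoffs_per_action(a, n, h, p, sigma2, beta)
            if values[a[n]] < max(values):              # switch only on strict improvement
                a[n] = values.index(max(values))        # lowest-index best response (assumption)
                changed = True
        if not changed:                                 # nobody can improve: a is a PNE
            return a, True
    return a, False


rng = random.Random(0)
M, N = 3, 4                                             # case 2 of the proof: M < N
h = [[rng.random() for _ in range(N)] for _ in range(M)]  # illustrative random gains
print(best_response_update([0] * N, h, [4.0] * N, 0.1, 2.0))  # start: all SBSs collide on SU 0
```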

We proved that both games $\mathcal{G}_1$ and $\mathcal{G}_2$ admit PNEs. In $\mathcal{G}_1$, all SBSs transmit at a PNE, and some SBSs may be in a collision with others, whereas in $\mathcal{G}_2$ no SBS transmits in a collision state at a PNE when $M \ge N$. This deteriorates the performance of games $\mathcal{G}_1$ and $\mathcal{G}_2$, as will be shown in Section VII.

The following two definitions will be used in the subsequent sections.
Definition 5 (Social Welfare Function): Let $\mathbf{a}$ be an action profile. The social welfare of $\mathbf{a}$ is the sum of the payoffs of all the players, i.e., $\sum_{n=1}^{N} u_n(\mathbf{a})$.


In other words, the social welfare function is the objective function (2a) of the optimization problem (2).


Definition 6 (Social Optimum): An action profile $\mathbf{a}$ is a social optimum if it maximizes the social welfare function.


In other words, a social optimum is an optimal solution of the optimization problem (2).

The proofs of Propositions 1 and 2 allow us to use the simple learning rule best response dynamics (BRD), since it is shown to converge to PNEs. Even though games $\mathcal{G}_1$ and $\mathcal{G}_2$ are shown to have PNEs, the BRD algorithm exhibits poor performance compared to the social optimum (as shown in Section VII). Besides, it requires a huge information exchange between the SBSs, since each SBS has to know the actions of all other players. Therefore, we propose a better game design by reformulating the previous games. Next, we propose the BRD algorithm and a completely distributed algorithm in order to solve the new game.


B. A Better Game Design

In this subsection, we aim to improve the performance of the previously discussed games. Hence, the objective is to design a game that is more efficient and whose performance is close to the social optimum. To that end, the user-BS association problem is reformulated as a non-cooperative repeated game where the players are the SBSs in $\mathcal{N}$. The set of actions of a player $n \in \mathcal{N}$, $\mathcal{A}_n$, is the set of SUs plus a silence action, i.e., $\mathcal{A}_n = \mathcal{M} \cup \{s\}$, where $s$ corresponds to the action of not transmitting. The game can be represented in normal form as $\mathcal{G} = (\mathcal{N}, \{\mathcal{A}_n\}_{n \in \mathcal{N}}, \{u_n\}_{n \in \mathcal{N}})$, where:

• $\mathcal{N}$ is the set of players, i.e., the SBSs;

• $\{\mathcal{A}_n\}_{n \in \mathcal{N}}$ is the set of action sets of the players, where the actions available to player $n$ are given by $\mathcal{A}_n = \mathcal{M} \cup \{s\}$. Similarly, the actions of all the players are given by the Cartesian product $\mathcal{A} = \mathcal{A}_1 \times \mathcal{A}_2 \times \cdots \times \mathcal{A}_N$. The vector $\mathbf{a} = (a_1, \dots, a_N) \in \mathcal{A}$ denotes an action profile of $\mathcal{G}$, where $a_n \in \mathcal{A}_n$; and

• $\{u_n\}_{n \in \mathcal{N}}$ is the set of payoffs of the players. The payoff of player $n$ is given by the function $u_n : \mathcal{A} \to \{-1, 0, 1\}$. This payoff is viewed as the gain observed by $n$ when choosing to transmit to $m$ (i.e., $a_n = m$) or when remaining silent (i.e., $a_n = s$).

Clearly, the payoff of each player depends not only on its own action but on the entire action profile, since the SINR depends on both the chosen SU (i.e., the channel gain of the established link) and the set of active SBSs (i.e., the amount of interference caused to the chosen SU). Furthermore, we have assumed that one SU cannot be associated with more than one SBS; hence, two players choosing the same action in $\mathcal{M}$ have to be penalized with a negative payoff in order to prevent collisions. The SINR expression is given in (4), where $\mathcal{S}_n$ is now replaced by $\mathcal{S}'_n = \{n' \in \mathcal{N} : n' \neq n \text{ and } a_{n'} \neq s\}$. The payoff function of each player $n \in \mathcal{N}$ is given explicitly by:


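$$u_n(\mathbf{a}) \overset{\text{def}}{=} \begin{cases} 0 & \text{if } C_1 \\ -1 & \text{if } C_2 \\ 1 & \text{if } C_3 \end{cases} \qquad (9)$$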


The conditions $C_1$, $C_2$, and $C_3$ are given below:

• $C_1$: $a_n = s$, i.e., SBS $n$ chooses to remain silent;

• $C_2$: $a_n = m$ and ($\mathrm{SINR}_{a_n n}(a_n, \mathbf{a}_{-n}) < \beta$ or $\exists n' \neq n : a_{n'} = m$), i.e., SBS $n$ chooses to transmit to SU $m$ and either the required SINR is not met or another SBS $n'$ chooses the same SU $m$; and

• $C_3$: $a_n = m$ and $\mathrm{SINR}_{a_n n}(a_n, \mathbf{a}_{-n}) \ge \beta$ and $\forall n' \neq n, \ a_{n'} \neq m$, i.e., SBS $n$ chooses to transmit to SU $m$, the required SINR is met, and no other SBS $n' \neq n$ chooses SU $m$.
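The payoff (9) can be written compactly in code. The sketch below assumes 0-based indices, `h[m][n]` = |g_mn|², the silence action $s$ encoded as `None`, and a hypothetical function name; only SBSs with $a_{n'} \neq s$ contribute interference, as in $\mathcal{S}'_n$.

```python
S = None  # encoding of the silence action s (assumption)


def payoff_g(a, n, h, p, sigma2, beta):
    """Payoff u_n of Eq. (9); a[k] is an SU index or S, h[m][n] = |g_mn|^2."""
    m = a[n]
    if m is S:
        return 0                                              # C1: SBS n is silent
    if any(a[k] == m for k in range(len(a)) if k != n):
        return -1                                             # C2: another SBS chose SU m
    # S'_n: only the other SBSs that actually transmit cause interference
    interference = sum(p[k] * h[m][k] for k in range(len(a)) if k != n and a[k] is not S)
    sinr = p[n] * h[m][n] / (sigma2 + interference)
    return 1 if sinr >= beta else -1                          # C3 if SINR met, else C2


h = [[1.0, 0.4],   # illustrative gains (assumed values)
     [0.1, 1.0]]
p = [4.0, 4.0]
print([payoff_g((0, 1), n, h, p, 1.0, 2.0) for n in range(2)])  # [-1, 1]
print([payoff_g((0, S), n, h, p, 1.0, 2.0) for n in range(2)])  # [1, 0]
```

In the first profile, SBS 1's interference pushes SU 0 below β, while SU 1 still meets it; silencing SBS 1 lets SBS 0 succeed.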

In game $\mathcal{G}$, a PNE is a feasible solution of the optimization problem (2) that is stable, i.e., a subset of SBSs $\subseteq \mathcal{N}$ transmits to a subset of SUs $\subseteq \mathcal{M}$, and no single SBS $n \in \mathcal{N}$ has an incentive to deviate from this solution: every transmitting SBS meets its SINR target, and no silent SBS can transmit successfully given the actions of the others. Note that this is not the case for $\mathcal{G}_1$ and $\mathcal{G}_2$, whose PNEs are not necessarily feasible solutions of the optimization problem (2) because of collisions. Unfortunately, the proposed game $\mathcal{G}$ does not have PNEs in general, as we show below.
Proposition 3 The game $\mathcal{G}$ does not admit a PNE in general.


Proof: We prove Proposition 3 by constructing a counterexample, i.e., an instance of the game that does not admit a PNE. Let $N = M = 3$, $\mathcal{N} = \{1, 2, 3\}$, $\mathcal{M} = \{1, 2, 3\}$, $p_n = 4$ for all $n \in \{1, 2, 3\}$, and $\beta = 2$. Further, assume that each SBS $i$ can only transmit to SU $i$, for all $i \in \{1, 2, 3\}$; SBS 1 can transmit along with SBS 3 while SBS 3 cannot transmit along with SBS 1; SBS 2 can transmit along with SBS 1 while SBS 1 cannot transmit along with SBS 2; and SBS 3 can transmit along with SBS 2 while SBS 2 cannot transmit along with SBS 3. Without loss of generality, this configuration can be transformed into a linear system of inequalities, one solution of which is given by the following matrix:


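$$\mathbf{H} = \begin{bmatrix} 1 & \frac{4}{10} & \frac{1}{10} \\ \frac{1}{10} & 1 & \frac{4}{10} \\ \frac{4}{10} & \frac{1}{10} & 1 \end{bmatrix} \qquad (10)$$

where $\mathbf{H} = [h_{mn}]$ is the matrix of channel coefficients, with $h_{mn} = |g_{mn}|^2$.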

We claim that this game has no PNE, which can be verified, for example, by testing all possible action profiles by brute force. Hence, there are instances of $\mathcal{G}$ where no PNE exists; therefore, $\mathcal{G}$ does not always admit PNEs. ■
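The brute-force verification mentioned above is easy to reproduce for three players with four actions each ($4^3 = 64$ profiles). The sketch below assumes the noise variance is normalized to $\sigma^2 = 1$ (not stated in the proof), 0-based indices, `None` for the silence action, and a channel matrix `H` chosen to satisfy the pairwise conditions listed in the proof; the helper names are hypothetical.

```python
from itertools import product

S = None  # encoding of the silence action s (assumption)


def payoff_g(a, n, h, p, sigma2, beta):
    """Payoff u_n of Eq. (9); h[m][n] = |g_mn|^2 with 0-based indices."""
    m = a[n]
    if m is S:
        return 0
    if any(a[k] == m for k in range(len(a)) if k != n):
        return -1
    interference = sum(p[k] * h[m][k] for k in range(len(a)) if k != n and a[k] is not S)
    return 1 if p[n] * h[m][n] / (sigma2 + interference) >= beta else -1


def pure_nash_equilibria(h, p, sigma2, beta):
    """Enumerate all action profiles and keep those where no player gains by deviating."""
    M, N = len(h), len(h[0])
    actions = list(range(M)) + [S]
    equilibria = []
    for a in product(actions, repeat=N):
        if all(payoff_g(a, n, h, p, sigma2, beta)
               >= max(payoff_g(a[:n] + (b,) + a[n + 1:], n, h, p, sigma2, beta) for b in actions)
               for n in range(N)):
            equilibria.append(a)
    return equilibria


H = [[1.0, 0.4, 0.1],   # row = SU m, column = SBS n (assumed values, sigma^2 = 1)
     [0.1, 1.0, 0.4],
     [0.4, 0.1, 1.0]]
print(pure_nash_equilibria(H, [4.0, 4.0, 4.0], 1.0, 2.0))  # []
```

With these values, SBS 1 meets β only when SBS 2 is silent, SBS 2 only when SBS 3 is silent, and SBS 3 only when SBS 1 is silent (1-based labels), so the transmit/silence best responses form an odd cycle and the enumeration finds no equilibrium.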

Even though the game $\mathcal{G}$ is shown to have no PNE in general, we observe through simulations (see Section VII) that it admits PNEs almost always. This observation is supported by simulations of different channel realizations, all of which have PNEs. In other words, we conjecture that PNEs always exist when the channel gains $|g_{mn}|^2$ include path loss and Rayleigh fading.


IV. Efficiency of the Proposed Game and Discussions 

This section discusses the efficiency of the proposed game $\mathcal{G}$. It also discusses the performance of games $\mathcal{G}_1$ and $\mathcal{G}_2$ and compares the convergence of the BRD and mWSLS algorithms to PNEs.


The efficiency of PNEs can be measured by the PoA and the PoS [23], [24]. We formally define these two ratios as follows:


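Definition 7: In the non-cooperative game $\mathcal{G}$, denote the set of PNE action profiles by $\mathcal{V}^* \subseteq \mathcal{A}$. Then, the PoA and the PoS are the respective ratios:

$$\mathrm{PoA} = \frac{\min_{\mathbf{a} \in \mathcal{V}^*} \sum_{n=1}^{N} u_n(\mathbf{a})}{\max_{\mathbf{a} \in \mathcal{A}} \sum_{n=1}^{N} u_n(\mathbf{a})} \qquad (11)$$

$$\mathrm{PoS} = \frac{\max_{\mathbf{a} \in \mathcal{V}^*} \sum_{n=1}^{N} u_n(\mathbf{a})}{\max_{\mathbf{a} \in \mathcal{A}} \sum_{n=1}^{N} u_n(\mathbf{a})} \qquad (12)$$

Fig. 2: PoA and PoS as a function of the number of players N for the case of 6 SUs and 8 SUs.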


Note that the discrete nature of the players' action sets and the random nature of the wireless channel make a closed-form analytic calculation of the PoA and the PoS very difficult. For this reason, we evaluate these two ratios using computer simulations. We used the Gambit software tools for game theory [25], integrated with Python.
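For toy instances, (11) and (12) can also be computed exactly by enumeration, which is a convenient sanity check for simulation code (this sketch does not use Gambit). It assumes 0-based indices, `None` for silence, $\sigma^2 = 1$, hypothetical helper names, and a hypothetical two-SBS, two-SU channel matrix built so that one PNE associates both SUs and another associates only one, i.e., PoA = 1/2 and PoS = 1, the same structure as the instance discussed for Fig. 4a.

```python
from fractions import Fraction
from itertools import product

S = None  # encoding of the silence action s (assumption)


def payoff_g(a, n, h, p, sigma2, beta):
    """Payoff u_n of Eq. (9); h[m][n] = |g_mn|^2 with 0-based indices."""
    m = a[n]
    if m is S:
        return 0
    if any(a[k] == m for k in range(len(a)) if k != n):
        return -1
    interference = sum(p[k] * h[m][k] for k in range(len(a)) if k != n and a[k] is not S)
    return 1 if p[n] * h[m][n] / (sigma2 + interference) >= beta else -1


def poa_pos(h, p, sigma2, beta):
    """Exact PoA and PoS of Eqs. (11)-(12); None if no PNE exists or the optimum is <= 0."""
    M, N = len(h), len(h[0])
    actions = list(range(M)) + [S]
    profiles = list(product(actions, repeat=N))

    def welfare(a):  # social welfare of Definition 5
        return sum(payoff_g(a, n, h, p, sigma2, beta) for n in range(N))

    def is_pne(a):   # Definition 1, checked against every unilateral deviation
        return all(payoff_g(a, n, h, p, sigma2, beta)
                   >= max(payoff_g(a[:n] + (b,) + a[n + 1:], n, h, p, sigma2, beta) for b in actions)
                   for n in range(N))

    pne_welfare = [welfare(a) for a in profiles if is_pne(a)]
    optimum = max(welfare(a) for a in profiles)
    if not pne_welfare or optimum <= 0:
        return None
    return Fraction(min(pne_welfare), optimum), Fraction(max(pne_welfare), optimum)


h = [[1.0, 0.1],   # hypothetical instance: PNEs (0, 1) and (1, s)
     [0.6, 2.0]]
print(poa_pos(h, [4.0, 4.0], 1.0, 2.0))  # (Fraction(1, 2), Fraction(1, 1))
```

Exhaustive enumeration grows as $(M+1)^N$, so for the network sizes of Fig. 2 a dedicated equilibrium solver or sampling is needed instead.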


Despite Proposition 3, simulation results show that the game $\mathcal{G}$ admits PNEs almost always. It is clear from Fig. 2 that the proposed game $\mathcal{G}$ is efficient, especially when the number of players is small; that is, the PoA and the PoS are close to 1. However, the PoA deteriorates as the number of players grows. This loss in the PoA means that the worst PNEs of the game are, on average, 23% away from the social optimum when N = 5. Even though the PoA of $\mathcal{G}$ decreases as the number of players grows, it is still considered good, as it is approximately equal to 77% when N = 5 and M = 8. Fortunately, the PoS does not suffer from such a loss, as depicted in Fig. 2. In fact, the figure shows that the PoS of the proposed game $\mathcal{G}$ is almost equal to 1 whatever the number of players N. This illustrates that the best PNEs of $\mathcal{G}$ almost coincide with the social optima. Therefore, if the PNE is well selected, it is (almost always) a social optimum.

Fig. 3: Performance of the BRD for games $\mathcal{G}_1$ and $\mathcal{G}_2$, M = 10.

Next, we compare the performance of games $\mathcal{G}_1$ and $\mathcal{G}_2$ to the social optimum. To that end, we implemented the BRD algorithm for both games. In game $\mathcal{G}_1$, when the BRD algorithm converges, the SBSs which have a payoff of -1 do not transmit, whereas, in game $\mathcal{G}_2$, the SBSs which have a payoff of -1 or -2 (depending on the number of SBSs and SUs) do not transmit. The comparison between the social optimum, obtained by the CPLEX solver for problem (2), and the PNEs of $\mathcal{G}_1$ and $\mathcal{G}_2$ is given in Fig. 3. This figure illustrates the poor performance of games $\mathcal{G}_1$ and $\mathcal{G}_2$: despite the existence of PNEs in both games, these PNEs are far from the social optimum, especially when N gets larger.
We also notice that the BRD algorithm for game $\mathcal{G}_2$ performs better than the BRD algorithm for game $\mathcal{G}_1$ when N is small, i.e., M > N. This is because the SBSs have more options to choose from and, further, there is no collision, as pointed out in the proof of Proposition 2. When N gets close to M, the BRD algorithm for game $\mathcal{G}_2$ performs very badly and is clearly outperformed by the BRD algorithm for game $\mathcal{G}_1$.

Next, we propose two strategy updating algorithms that perform well compared to the social optimum. The first algorithm is the BRD algorithm and the second one is inspired by the win-stay-lose-shift learning rule [18].


V. The Best Response Dynamics (BRD) 

To solve game 𝒢, we propose to use the BRD algorithm. Since game 𝒢 does not admit a
PNE in general, the BRD algorithm is not mathematically guaranteed to converge to a stable
solution. The BRD algorithm starts with a random user-BS association (a random action
profile a), produced by the function first_round in Algorithm 1, which draws the action of
each SBS from the probability distribution π_B. Then, BRD iterates either R times, where R
is the maximum number of iterations, or until it converges (i.e., finds a stable solution),
whichever occurs first. In each iteration, the players update their actions sequentially:
player n computes its set of best responses to the current actions of the other players and
updates its action accordingly; then player n' best responds to the updated profile, and so
on until all players have updated their actions. The pseudo-code of BRD is given in
Algorithm 1.
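The sequential best-response loop described above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' code: the payoff U_n is passed in as a generic callable, `max_rounds` plays the role of R, and the two-SBS, two-SU toy game at the end (its `FEASIBLE` table standing in for the SINR test, with two SBSs targeting the same SU counted as a failure) is hypothetical.

```python
import random
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

SILENT = "s"  # label used here for the silence action s

Profile = List[Hashable]
Payoff = Callable[[int, Profile], float]  # payoff(n, a) = U_n(a_n, a_{-n})


def deviate(a: Profile, n: int, m: Hashable) -> Profile:
    """Copy of profile a in which only player n switches to action m."""
    return a[:n] + [m] + a[n + 1:]


def is_pne(actions: Sequence[Sequence[Hashable]], payoff: Payoff, a: Profile) -> bool:
    """True if no player can strictly improve by a unilateral deviation."""
    for n, a_n_set in enumerate(actions):
        current = payoff(n, a)
        if any(payoff(n, deviate(a, n, m)) > current for m in a_n_set):
            return False
    return True


def best_response_dynamics(
    actions: Sequence[Sequence[Hashable]],
    payoff: Payoff,
    start: Profile,
    max_rounds: int = 10,
    rng: Optional[random.Random] = None,
) -> Tuple[Profile, bool]:
    """Sequential BRD with random tie-breaking; returns (profile, reached_pne)."""
    rng = rng or random.Random()
    a = list(start)
    rounds = 0
    while rounds < max_rounds and not is_pne(actions, payoff, a):
        for n, a_n_set in enumerate(actions):
            # Best-response set of player n against the current profile.
            scored = [(m, payoff(n, deviate(a, n, m))) for m in a_n_set]
            best = max(u for _, u in scored)
            best_set = [m for m, u in scored if u == best]
            a[n] = rng.choice(best_set)  # tie-breaking at random
        rounds += 1
    return a, is_pne(actions, payoff, a)


# Hypothetical toy instance: SBS 0 can serve SU 0 or SU 1, SBS 1 only SU 0.
FEASIBLE = [[True, True], [True, False]]
ACTIONS = [[0, 1, SILENT], [0, 1, SILENT]]


def toy_payoff(n: int, a: Profile) -> int:
    """0 if silent, 1 if SU a_n is targeted alone and feasibly, -1 otherwise."""
    if a[n] == SILENT:
        return 0
    shared = sum(1 for x in a if x == a[n]) > 1
    return 1 if FEASIBLE[n][a[n]] and not shared else -1
```

Starting from (s, s), SBS 0 faces a tie between SU 0 and SU 1, much like the situation of Fig. 4b: one branch ends in the PNE [0, 's'] with a single association, the other in the optimum [1, 0], where both SUs are served.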

The BRD algorithm may converge to a bad PNE because of 1) the choice made by the
first_round function, or 2) the tie-breaking rule in line 22 of Algorithm 1. Fig. 4a shows an
instance of game 𝒢 with PoS = 1 and PoA = 1/2. As shown in Fig. 4a, the BRD algorithm
converges to a PNE in which only 50% of the SUs are associated (u_2 is associated and u_1
is not), even though game 𝒢 admits a PNE in which 100% of the SUs are associated (both
u_1 and u_2). The BRD algorithm reaches this bad PNE because it starts from the action
profile (s, u_1), as shown by the arrows in Fig. 4a; had it started from another action
profile, it would have reached the social optimum. In Fig. 4b, starting from the action
profile (s, s), the BRD algorithm has to choose between two best responses, (u_1, s) and
(u_2, s), as shown by the arrows. If it chooses (u_2, s), it converges to a PNE that achieves
only 50% of the social optimum, whereas if it chooses (u_1, s), it converges to a social
optimum.

Fig. 4: Instance of 𝒢 with different convergence behaviors of the BRD algorithm.

Algorithm 1 Best Response Dynamics for Game 𝒢
Require: M, N, p_n, β
Ensure: A near-optimal solution a*
1: function first_round(π_B, 𝒩)   ▷ π_B: a probability distribution
2:   for n in 𝒩 do
3:     SBS n plays a random action a_n according to π_B
4:   end for
5:   for n in 𝒩 do
6:     SBS n calculates its payoff u_n = U_n(a_n, a_{-n})
7:   end for
8:   return a = (a_1, ..., a_N), u = (u_1, ..., u_N)
9: end function
10: r, R ← 0, 10; (a, u) ← first_round(π_B, 𝒩)   ▷ R bounds the number of iterations so that the loop terminates
11: while (r < R) and (a is not a PNE) do
12:   for n in 𝒩 do
13:     BR_n ← ∅   ▷ best-response set of SBS n, reset for every player
14:     for m in 𝒜_n do
15:       if U_n(m, a_{-n}) > U_n(a_n, a_{-n}) then
16:         BR_n ← {m}
17:         a_n ← m
18:       else if U_n(m, a_{-n}) == U_n(a_n, a_{-n}) then
19:         BR_n ← BR_n ∪ {m}
20:       end if
21:     end for
22:     SBS n selects an action m from BR_n   ▷ tie-breaking at random
23:     a_n ← m
24:   end for
25:   r ← r + 1
26: end while
27: a* ← a
28: return a*

To obtain a near-optimal solution, the BRD algorithm is executed Q times with the same
probability distribution. Re-executing it Q times makes it more likely that the best PNE is
reached at least once (this is shown in Section VII), so that the probability that the BRD
algorithm reaches a social optimum approaches one. However, this solution requires a large
amount of information exchange and is computationally complex. Therefore, in the next
section, we propose a completely distributed algorithm that offers a better trade-off between
computation, information exchange, and performance.
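A back-of-the-envelope way to see why re-execution drives the success probability toward one: if we assume (our simplification, not a claim of the paper) that the Q executions are independent and that each one reaches a social optimum with the same probability p, then at least one of them does so with probability 1 - (1 - p)^Q. For instance, p = 1/2 would correspond to the start (s, s) of Fig. 4b with a fair random tie-break. The helper names below are hypothetical.

```python
def prob_optimum_reached(p: float, q: int) -> float:
    """P(at least one of q independent runs succeeds) when each succeeds w.p. p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if q < 0:
        raise ValueError("q must be non-negative")
    return 1.0 - (1.0 - p) ** q


def runs_needed(p: float, target: float) -> int:
    """Smallest Q such that 1 - (1 - p)^Q >= target."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be a probability")
    if target > 0.0 and p == 0.0:
        raise ValueError("target is unreachable when p = 0")
    if target == 1.0 and p < 1.0:
        raise ValueError("target = 1 requires p = 1")
    q = 0
    while prob_optimum_reached(p, q) < target:
        q += 1
    return q
```

With p = 0.5, five runs give 1 - 2^-5 = 0.96875, and runs_needed(0.5, 0.99) returns 7. In practice p depends on the instance and on π_B, and the runs are only independent if each one uses fresh randomness.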

VI. The Modified Win-Stay-Lose-Shift Algorithm (mWSLS)

In this section, we describe a completely distributed algorithm called mWSLS, for
modified win-stay-lose-shift. It is inspired by the well-known win-stay-lose-shift learning
algorithm [^]. We first describe the first iteration of the algorithm and its parameters, and
then the learning process, whose pseudo-code is given in Algorithm 2.

A. Overview and the First Iteration 

Each SBS n runs the same learning algorithm. It starts by randomly playing an action a_n
from its action space 𝒜_n and then transmits to the chosen SU a_n = m. SU m computes
its SINR and feeds back to SBS n whether this SINR is above the threshold or not.
Specifically, SU m sends one bit of feedback, which tells SBS n how good its choice is.
Clearly, this feedback depends not only on the action of SBS n but on the entire action
profile played by all the SBSs during this first iteration. Hence, SBS n computes its reward
from the feedback as follows:

$$
U_n(a_n, a_{-n}) =
\begin{cases}
0 & \text{if } a_n = s,\\
-1 & \text{if } a_n \neq s \text{ and } b = 0,\\
1 & \text{if } a_n \neq s \text{ and } b = 1,
\end{cases}
\qquad (13)
$$

where b is the feedback bit of the chosen SU m = a_n. Note that, to prevent the colliding
state in which two or more SBSs choose the same SU, each SBS broadcasts its chosen action
in the network.
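The feedback-to-reward mapping of (13) is small enough to state directly in code. This Python sketch assumes that the SU reports b = 1 when its SINR is at least the threshold β (the text only says "above the threshold"), converts β from dB to linear scale, and leaves the SINR computation itself to the caller; the function names are ours.

```python
from typing import Hashable

SILENT = "s"  # label used here for the silence action s


def feedback_bit(sinr_linear: float, beta_db: float) -> int:
    """One-bit feedback sent by the SU: 1 if its SINR meets beta (assumed >=), else 0."""
    beta_linear = 10.0 ** (beta_db / 10.0)
    return 1 if sinr_linear >= beta_linear else 0


def reward(action: Hashable, b: int) -> int:
    """Reward of SBS n as in (13): 0 when silent, +1 on success, -1 on failure."""
    if action == SILENT:
        return 0  # a silent SBS transmits nothing and receives no feedback
    if b not in (0, 1):
        raise ValueError("feedback must be a single bit")
    return 1 if b == 1 else -1
```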

Based on the received feedback and the computed rewards, an SBS that chooses to transmit
during the current slot obtains information about the quality of its chosen action, whereas
an SBS that chooses to remain silent during that slot obtains no new information. Every
time an SBS learns something new about the system, it should exploit it in order to play a
better action in the future. Therefore, we associate with each SBS n ∈ 𝒩 a probability
vector π_n = [π_n[1], ..., π_n[M], π_n[s]]^T of size M + 1. Each element π_n[m] is the
probability that SBS n plays action a_n = m. For the first iteration, we assume that the
probabilities in each vector π_n are uniform, i.e., π_n[m] = 1/(M + 1), ∀m ∈ {1, ..., M} ∪ {s}.
Each SBS picks its first action according to this distribution.
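The initialization can be written as follows. In this sketch (our own representation, not prescribed by the paper), π_n is a dictionary keyed by the SU indices 1, ..., M and the silent label, and each action is drawn with the standard-library method `random.Random.choices`.

```python
import random
from typing import Dict, Hashable

SILENT = "s"  # label used here for the silence action s


def uniform_policy(num_sus: int) -> Dict[Hashable, float]:
    """Initial pi_n: each of the M SUs and the silent action gets probability 1/(M+1)."""
    if num_sus < 0:
        raise ValueError("num_sus must be non-negative")
    actions = list(range(1, num_sus + 1)) + [SILENT]
    return {a: 1.0 / len(actions) for a in actions}


def sample_action(pi: Dict[Hashable, float], rng: random.Random) -> Hashable:
    """Draw one action according to the probability vector pi."""
    actions = list(pi)
    return rng.choices(actions, weights=[pi[a] for a in actions], k=1)[0]
```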


B. The Learning Process 

Once the SBSs acquire the feedback of their corresponding SUs and map it into rewards,
they update their probability vectors. These updates are the core of the learning process.
In the original WSLS algorithm, each player starts by playing a random action. If the
played action results in a higher payoff, it is considered a winning action and the player
keeps playing it in the next round. Otherwise, the action is considered a losing one and the
player shifts to another action in the hope of improving its reward. WSLS is widely and
efficiently applied when the rewards are Boolean (either 0 or 1). However, this learning
strategy must be adapted when the rewards are finite but not Boolean, which is the case in
this paper, where the rewards take values in {-1, 0, 1}.

When the reward is equal to 1, the action played in the current iteration is considered a
winning action. Hence, the probability of playing it must be increased in order to raise the
chances of converging to this action at the end of the learning process. Precisely, the
probability of the winning action is updated as follows:

$$
\pi_n^{(t+1)}[m] = \pi_n^{(t)}[m] + \tau\left(1 - \pi_n^{(t)}[m]\right), \qquad (14)
$$

where m is the index of the winning action, t is the iteration index,² and τ is the winning
increment factor by which the probability of choosing the winning action during the next
iteration is increased. All the probabilities in π_n other than π_n[m] must be scaled down
by the same factor (1 - τ) in order to keep their sum (including π_n[m]) equal to one. Hence,
these probabilities are updated as follows:

$$
\pi_n^{(t+1)}[m'] = \pi_n^{(t)}[m'] - \tau\,\pi_n^{(t)}[m'], \quad \forall m' \neq m. \qquad (15)
$$

Indeed, since the probabilities of the actions m' ≠ m sum to 1 - π_n^{(t)}[m], the updated
probabilities sum to π_n^{(t)}[m] + τ(1 - π_n^{(t)}[m]) + (1 - τ)(1 - π_n^{(t)}[m]) = 1.
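Equations (14) and (15) translate into a few lines of Python. The dictionary representation of π_n and the function name are our own choices; the assertions in the usage below check the requirement stated above that the probabilities keep summing to one.

```python
from typing import Dict, Hashable


def win_update(pi: Dict[Hashable, float], m: Hashable, tau: float) -> Dict[Hashable, float]:
    """Apply (14)-(15) after action m earned reward +1; returns a new probability vector."""
    if m not in pi:
        raise KeyError(f"unknown action {m!r}")
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    # (15): every other action keeps a fraction (1 - tau) of its probability.
    new_pi = {a: p * (1.0 - tau) for a, p in pi.items()}
    # (14): the winning action absorbs the freed mass tau * (1 - pi[m]).
    new_pi[m] = pi[m] + tau * (1.0 - pi[m])
    return new_pi
```

Repeated wins drive π_n[m] toward one geometrically, since 1 - π_n[m] is multiplied by (1 - τ) at every win; this is why a larger τ converges faster.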

When the reward is equal to -1, the action chosen in the current iteration is a losing one.
The probability of playing this action in the next iteration is reduced and the probability of
remaining silent (i.e., playing the action s) is increased. This learning strategy is motivated
by the fact that when an SBS plays many losing actions, it is preferable to steer it toward
the silence action. The probability vector is therefore updated as follows:

$$
\pi_n^{(t+1)}[m] = \pi_n^{(t)}[m] - \varepsilon, \qquad (16)
$$

$$
\pi_n^{(t+1)}[s] = \pi_n^{(t)}[s] + \varepsilon, \qquad (17)
$$

where m is the index of the losing action, s is the index of the silent action, and ε is the
losing decrement factor. When performing these updates, the algorithm must always check
that the probabilities remain consistent, i.e., 0 ≤ π_n[m] ≤ 1 for all n and m, and
∑_m π_n[m] = 1 for all n. Since (16) and (17) move the same mass ε, the sum is preserved
automatically; the check mainly guards against π_n[m] becoming negative when π_n[m] < ε.
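A possible implementation of (16)-(17) is sketched below. The paper only requires that the probabilities stay consistent; capping the transferred mass at π_n[m], so that a small π_n[m] is driven exactly to zero instead of becoming negative, is our assumption about how that check is enforced.

```python
from typing import Dict, Hashable

SILENT = "s"  # label used here for the silence action s


def lose_update(pi: Dict[Hashable, float], m: Hashable, eps: float,
                silent: Hashable = SILENT) -> Dict[Hashable, float]:
    """Apply (16)-(17) after action m earned reward -1; returns a new probability vector."""
    if m == silent:
        raise ValueError("a silent SBS gets reward 0, never -1")
    if eps < 0.0:
        raise ValueError("eps must be non-negative")
    new_pi = dict(pi)
    delta = min(eps, pi[m])               # assumption: cap the shift so pi[m] stays >= 0
    new_pi[m] = pi[m] - delta             # (16)
    new_pi[silent] = pi[silent] + delta   # (17)
    return new_pi
```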

After a given number of iterations, each SBS learns to play either a winning action, which
yields a positive reward, or the silence action. Every SBS that ends up transmitting
converges to a steady state in which its probability vector contains a single one, positioned
at the winning action corresponding to its associated SU, whereas an SBS whose probability
vector converges to one at index s remains silent during the current time slot. The
pseudo-code of the part of the mWSLS algorithm executed in each SBS is given in
Algorithm 2.

² The iteration index is only used when necessary.

Algorithm 2 Pseudo-code of the part of the mWSLS algorithm executed in each SBS
Require: 𝒜_n, ε, τ, π_n   ▷ π_n = [π[1, n], ..., π[M, n], π[s, n]]^T is a probability vector
Ensure: An action a_n
1: if U_n(a_n, ·) == 1 then
2:   π[a_n, n] ← τ + π[a_n, n] · (1 - τ)
3:   tmp ← 𝒜_n \ {a_n}
4:   π[tmp, n] ← π[tmp, n] · (1 - τ)
5: else if U_n(a_n, ·) == -1 then
6:   π[a_n, n] ← π[a_n, n] - ε
7:   π[s, n] ← π[s, n] + ε
8: end if
9: return a new action a_n drawn according to the updated π_n
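The steady state described above can be read off each probability vector. The helper below is a hypothetical illustration: the tolerance `tol` that decides when a probability counts as "one" is our own parameter, since the paper does not specify a numerical convergence criterion.

```python
from typing import Dict, Hashable, Optional


def converged_action(pi: Dict[Hashable, float], tol: float = 1e-3) -> Optional[Hashable]:
    """Return the action holding (almost) all probability mass, or None if not converged.

    An SU index means the SBS serves that SU; the silent label means it stays silent.
    The default tolerance is a hypothetical choice, not a value from the paper.
    """
    if not pi:
        raise ValueError("empty probability vector")
    action, p = max(pi.items(), key=lambda kv: kv[1])
    return action if p >= 1.0 - tol else None
```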


VII. Simulation Results 

In this section, we present simulation results for the proposed learning algorithm mWSLS
and for the BRD algorithm. We first investigate the impact of the parameters of both
algorithms on their performance. Then, we compare the efficiency of mWSLS and BRD
against the social optimum in terms of the average number of associated users.

For all simulations, the coefficient g_mn is given by

$$
g_{mn} = |\rho_{mn}|^2 \left(\frac{d_{mn}}{d_0}\right)^{-\alpha},
$$

where α is the path-loss exponent, d_mn is the distance between m and n, generated
uniformly at random, d_0 is a reference distance at which the path loss is calculated, and
ρ_mn is the long-term fading, modeled as a zero-mean complex Gaussian random variable
with unit variance. Unless otherwise specified, the SINR threshold required by each SU is
set to β = 0 dB, the transmit power of each SBS n is fixed to p_n = 10 dB (normalized by
the noise power and the reference distance d_0), and the path-loss exponent is set to α = 4.
The long-term fading is assumed to be Rayleigh. The number of iterations of mWSLS is set
to T = 100. All simulation results are averaged over 5000 channel realizations.
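For readers who want to reproduce the channel model, the sketch below draws one coefficient g_mn under the stated settings. It assumes the common form g_mn = |ρ_mn|² (d_mn/d_0)^(-α) with a unit-variance circularly symmetric ρ_mn, so that |ρ_mn|² is exponentially distributed with mean one (Rayleigh fading). The distance range [d_min, d_max] is hypothetical, since the text only says that d_mn is uniformly generated.

```python
import math
import random


def db_to_linear(x_db: float) -> float:
    """Convert a dB value (e.g. beta = 0 dB, p_n = 10 dB) to linear scale."""
    return 10.0 ** (x_db / 10.0)


def channel_gain(rng: random.Random, d_min: float, d_max: float,
                 d0: float = 1.0, alpha: float = 4.0) -> float:
    """One draw of g_mn = |rho_mn|^2 * (d_mn / d0)^(-alpha) with Rayleigh fading."""
    d_mn = rng.uniform(d_min, d_max)  # hypothetical distance range
    sigma = math.sqrt(0.5)            # each component ~ N(0, 1/2), so E|rho|^2 = 1
    rho = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
    return abs(rho) ** 2 * (d_mn / d0) ** (-alpha)
```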
Fig. 5: Average number of associated SUs vs. the number of re-executions Q of the BRD algorithm.

For the BRD algorithm, the most important parameter is the number of times Q it is
executed. In Fig. 5, for a fixed number of SBSs N = 10 and a fixed number of SUs M = 6
and M = 10, we compare the mWSLS algorithm and the social optimum against the BRD
algorithm. The BRD curve in Fig. 5 is obtained by executing the BRD algorithm Q times,
with π_B the uniform distribution.

Fig. 5 shows that when Q is small, the BRD algorithm performs worse than the mWSLS
algorithm and is far from the social optimum. This is due to the poor initial association
produced by the first round of the algorithm, as explained with Fig. 4. As Q increases,
however, the performance of the BRD algorithm improves until it outperforms mWSLS. This
is essentially because BRD explores more of the set of PNEs and therefore reaches the best
PNE with high probability. We note that when the action space is larger, BRD outperforms
mWSLS with fewer re-executions (fewer than 30 for M = 10, compared to 50 for M = 6).
Fig. 7: Average number of associated SUs vs. T.

Fig. 8: Average number of associated SUs vs. ε.

The performance and convergence of the mWSLS algorithm depend on the choice of three
parameters, namely the winning increment factor τ, the losing decrement factor ε, and the
number of iterations T. We now discuss their impact on performance. Fig. ?? plots the
average number of associated SUs as a function of τ. We notice that the impact of τ on
performance increases with the number of SBSs. Also, the best value of τ for the three
scenarios is around 0.1, although it tends to get smaller as N gets larger. The number of
associated SUs as a function of the iteration index for different values of τ is shown in
Fig. 7. The best τ yields higher performance but converges very slowly, whereas a large τ
yields poor performance but converges very quickly; e.g., τ = 0.3 serves three SUs in fewer
than 15 iterations, while the same performance requires more than a hundred iterations
with τ = 0.1. Hence, τ can be seen as a tuning parameter that adjusts the
performance/convergence-time trade-off. Fig. 8 plots the performance of the algorithm as
a function of ε for different numbers of SBSs N. The best value of ε is around 0.01, and ε
has a noticeable impact on learning performance. This impact becomes more significant as
N grows because, with more SBSs in the network, more of them must remain silent.
Fig. 10: Performance comparison of mWSLS, BRD with Q = 30, and the social optimum.

Fig. ?? presents the percentage of time the mWSLS algorithm converges to a PNE. As
stated before, no known completely distributed algorithm is guaranteed to converge to such
an equilibrium. However, Fig. ?? shows that the mWSLS algorithm allows the SBSs to play
a PNE for a long period of time, especially for small N. Even when N = 10, the algorithm
converges to a PNE more than 95% of the time.

In Fig. 10, we compare the performance of the mWSLS algorithm against the optimal
solution, derived by the branch-and-bound algorithm (CPLEX), and against BRD for game 𝒢.
Both BRD and mWSLS are close to the social optimum even for large values of M and N.
Further, the performance gap between the two strategy-updating algorithms stays the same
as N grows. This small gap supports the use of the completely distributed mWSLS
algorithm in practice.

VIII. Conclusion 

This paper studies the user-BS association problem under quality-of-service requirements
in HetSNets. Centralized optimal algorithms suffer from many implementation issues. First,
they need a large amount of feedback, which relies on resource-consuming information
exchange between the SBSs and the SUs. Second, they suffer from a huge computational
complexity since the underlying optimization problem is NP-hard. Thus, in this paper, we
modeled this problem using non-cooperative game theory. First, we showed that the first
two formulated games admit PNEs but perform poorly. Hence, we designed and studied a
better game model. We showed that even though this game does not admit a PNE in
general, when it does, its PoA and PoS are very close to 1. Next, we proposed the BRD
algorithm to solve the game. We also proposed a completely distributed algorithm, based
on the win-stay-lose-shift learning model (called mWSLS), which assumes no coordination
between the SBSs. Through simulations, we assessed the performance of the proposed game
and of the proposed algorithms. The mWSLS algorithm is shown to converge to a
near-optimal user association, approaching both the BRD algorithm and the centralized
optimum obtained by a computationally complex algorithm that assumes complete channel
information. Furthermore, the mWSLS algorithm converges to a PNE with high probability.